Learning Comprehensive Motion Representation for Action Recognition
Abstract
For action recognition learning, 2D CNN-based methods are efficient but may yield redundant features because they apply the same convolution kernel to each frame. Recent efforts attempt to capture motion information by establishing inter-frame connections, but they still suffer from a limited temporal receptive field or high latency. Moreover, feature enhancement in action recognition is often performed only along the channel or the spatial dimension. To address these issues, we first devise a Channel-wise Motion Enhancement (CME) module that adaptively emphasizes the channels related to dynamic information with a channel-wise gate vector. The channel gates generated by CME incorporate information from all the other frames in the video. We further propose a Spatial-wise Motion Enhancement (SME) module that focuses on the regions containing the critical target in motion, according to the point-to-point similarity between adjacent feature maps. The intuition is that the background typically changes more slowly than the motion area. Both CME and SME have a clear physical meaning in capturing action clues. By integrating the two modules into an off-the-shelf 2D network, we obtain a Comprehensive Motion Representation (CMR) learning method for action recognition, which achieves competitive performance on Something-Something V1 & V2 and Kinetics-400. On the temporal reasoning datasets Something-Something V1 and V2, our method outperforms the current state of the art by 2.3% and 1.9%, respectively, when using 16 frames as input.
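The abstract describes CME and SME only at a high level, so the plain-Python sketch below illustrates the two ideas under loudly stated assumptions; it is not the authors' implementation. Assumed here: features are laid out as `video[t][c][p]` (frame, channel, flattened spatial position); the CME-style gate for frame t is a sigmoid of the mean absolute difference between its pooled channel responses and those of every other frame (the learnable layers of the real module are omitted); the SME-style mask is one minus the rescaled cosine similarity between feature vectors at the same position in adjacent frames; and the two are combined as `x * gate * (1 + mask)`. All function names are hypothetical.

```python
import math
from typing import List

# Assumed layout: video[t][c][p] is the activation of channel c at flattened
# spatial position p in frame t, i.e. a (T, C, H*W) feature map.
Video = List[List[List[float]]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_motion_gates(video: Video) -> List[List[float]]:
    """Hypothetical CME-style gates, one per (frame, channel).

    Assumption: a channel counts as motion-related when its pooled response in
    frame t differs from its responses in all other frames. Learnable layers
    of the real module are omitted.
    """
    if len(video) < 2:
        raise ValueError("need at least two frames")
    # Global average pooling over spatial positions -> pooled[t][c].
    pooled = [[sum(ch) / len(ch) for ch in frame] for frame in video]
    num_frames, num_channels = len(pooled), len(pooled[0])
    gates = []
    for t in range(num_frames):
        others = [j for j in range(num_frames) if j != t]
        # Every other frame in the clip contributes to the gate of frame t.
        gates.append([
            sigmoid(sum(abs(pooled[t][c] - pooled[j][c]) for j in others) / len(others))
            for c in range(num_channels)
        ])
    return gates


def cosine(u: List[float], v: List[float]) -> float:
    denom = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    # Treat an all-zero feature vector as "unchanged".
    return 1.0 if denom == 0.0 else sum(a * b for a, b in zip(u, v)) / denom


def spatial_motion_mask(video: Video, t: int) -> List[float]:
    """Hypothetical SME-style mask for frame t, one value per spatial position.

    Assumption: each position is compared with the same position in the next
    frame (the last frame uses the previous one). Low point-to-point
    similarity suggests a moving region, since background changes slowly.
    """
    neighbor = t + 1 if t + 1 < len(video) else t - 1
    frame, other = video[t], video[neighbor]
    mask = []
    for p in range(len(frame[0])):
        u = [ch[p] for ch in frame]  # feature vector at position p in frame t
        v = [ch[p] for ch in other]  # same position in the adjacent frame
        similarity = (cosine(u, v) + 1.0) / 2.0  # rescale [-1, 1] to [0, 1]
        mask.append(1.0 - similarity)  # 0 for points that did not change
    return mask


def enhance(video: Video) -> Video:
    """Apply both sketches in sequence (an illustrative choice, not the paper's wiring)."""
    gates = channel_motion_gates(video)
    out = []
    for t, frame in enumerate(video):
        mask = spatial_motion_mask(video, t)
        out.append([
            [x * gates[t][c] * (1.0 + mask[p]) for p, x in enumerate(ch)]
            for c, ch in enumerate(frame)
        ])
    return out
```

For a toy three-frame clip in which channel 0 and spatial position 0 stay constant while the rest change, the constant channel receives a gate of exactly 0.5 and the constant position a mask of 0, so the changing channel and position are the ones emphasized. The residual `(1 + mask)` form keeps background features instead of zeroing them, which is one plausible reading of "focus on" rather than "select".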
Similar resources
ActionFlowNet: Learning Motion Representation for Action Recognition
Even with the recent advances in convolutional neural networks (CNN) in various visual recognition tasks, the state-of-the-art action recognition system still relies on hand crafted motion feature such as optical flow to achieve the best performance. We propose a multitask learning model ActionFlowNet to train a single stream network directly from raw pixels to jointly estimate optical flow whi...
A Comprehensive Review on Handcrafted and Learning-Based Action Representation Approaches for Human Activity Recognition
Human activity recognition (HAR) is an important research area in the fields of human perception and computer vision due to its wide range of applications. These applications include: intelligent video surveillance, ambient assisted living, human computer interaction, human-robot interaction, entertainment, and intelligent driving. Recently, with the emergence and successful deployment of deep ...
Scale-Based Human Motion Representation for Action Recognition
The design of action recognition algorithms often relies on knowledge of the particular problem, which is not always available. Moreover, algorithms usually incorporate a number of parameters, which influence their performance. To solve these problems, we explore the possibility of developing more general action recognition algorithms by systematic reduction of complexity of human motion, inste...
Motion Context: A New Representation for Human Action Recognition
One of the key challenges in human action recognition from video sequences is how to model an action sufficiently. Therefore, in this paper we propose a novel motion-based representation called Motion Context (MC), which is insensitive to the scale and direction of an action, by employing image representation techniques. A MC captures the distribution of the motion words (MWs) over relative loc...
End-to-end Video-level Representation Learning for Action Recognition
From the frame/clip-level feature learning to the video-level representation building, deep learning methods in action recognition have developed rapidly in recent years. However, current methods suffer from the confusion caused by partial observation training, or without end-to-end learning, or restricted to single temporal scale modeling and so on. In this paper, we build upon two-stream ConvN...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i4.16400